Reed S E, Akata Z, Mohan S, et al. Learning what and where to draw[C]//Advances in Neural Information Processing Systems. 2016: 217-225.
1. Overview
This paper attempts to disentangle the semantic information of the two modalities (text and image) and to synthesize new images from the combined semantics. The generated image should meet two requirements:
- realistic while matching the target text description
- maintain other image features that are irrelevant to the text description
1.1. Related Work
- deterministic networks
- VAE
- autoregressive models
- GAN
2. Methods
2.1. Architecture
- conditioning augmentation (CA) technique from StackGAN
- residual blocks, so the output image retains a structure similar to the source image
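The CA step can be sketched as a reparameterized Gaussian sample; a minimal numpy version, where `W_mu` and `W_sigma` are hypothetical stand-ins for the learned fully connected layers of StackGAN's CA module:

```python
import numpy as np

def conditioning_augmentation(text_emb, W_mu, W_sigma, rng):
    """Sample a conditioning vector c = mu + sigma * eps (reparameterization trick).

    W_mu / W_sigma are hypothetical linear projections standing in for the
    learned layers that map the text embedding to the Gaussian's parameters.
    """
    mu = text_emb @ W_mu                  # mean of the conditioning Gaussian
    log_sigma = text_emb @ W_sigma        # log standard deviation
    eps = rng.standard_normal(mu.shape)   # noise for reparameterization
    return mu + np.exp(log_sigma) * eps
```

CA smooths the text-embedding manifold and injects stochasticity; during training a KL term against N(0, I) regularizes the sampled distribution.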
2.2. Adaptive Loss for Semantic Image Synthesis
- positive pairs: real image with its matching text description
- negative pairs: mismatched real image-text pairs, and synthesized images
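A minimal sketch of a discriminator loss over these pair types, assuming the standard matching-aware text-conditional GAN formulation (the paper's adaptive weighting is omitted here):

```python
import numpy as np

def discriminator_loss(d_real_match, d_real_mismatch, d_fake_match):
    """Log loss over the three pair types; inputs are sigmoid outputs in (0, 1).

    (real image, matching text)      -> positive pair, pushed toward 1
    (real image, mismatched text)    -> negative pair, pushed toward 0
    (generated image, matching text) -> negative pair, pushed toward 0
    """
    eps = 1e-8  # numerical stability inside the logs
    pos = -np.log(d_real_match + eps)
    neg = -(np.log(1.0 - d_real_mismatch + eps)
            + np.log(1.0 - d_fake_match + eps)) / 2.0
    return float(np.mean(pos + neg))
```

The mismatched-text negatives are what force the discriminator (and hence the generator) to check text-image consistency rather than realism alone.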
2.3. Loss Function
2.4. Improving Image Feature Representation
- image encoder: conv4 features of a pretrained VGG network
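Given such encoder features for the source and generated images, the comparison is typically a feature-matching (perceptual) loss; a minimal sketch over precomputed activations:

```python
import numpy as np

def feature_matching_loss(feat_src, feat_gen):
    """Mean squared distance between encoder feature maps
    (e.g. VGG conv4 activations) of source and generated images."""
    return float(np.mean((feat_src - feat_gen) ** 2))
```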
2.5. Visual-Semantic Text Embedding
- pair-wise ranking loss
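The pair-wise ranking loss can be sketched as a bidirectional hinge over a batch, in the style of visual-semantic embeddings; treating all mismatched rows in the batch as negatives is an assumption of this sketch:

```python
import numpy as np

def pairwise_ranking_loss(img_emb, txt_emb, margin=0.2):
    """Bidirectional max-margin ranking loss.

    img_emb, txt_emb: L2-normalized embeddings; row i of each is a matching
    pair, and every other row in the batch serves as a negative.
    """
    scores = img_emb @ txt_emb.T            # pairwise cosine similarities
    pos = np.diag(scores)                   # matching-pair scores
    # hinge in both retrieval directions, excluding the diagonal
    cost_i2t = np.maximum(0.0, margin + scores - pos[:, None])
    cost_t2i = np.maximum(0.0, margin + scores - pos[None, :])
    n = scores.shape[0]
    mask = 1.0 - np.eye(n)
    return float(((cost_i2t + cost_t2i) * mask).sum() / n)
```

The loss is zero once every matching pair outscores all mismatches by the margin, which is exactly what the joint embedding needs for retrieval-style matching.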
3. Experiments
3.1. Details
- Adam optimizer with learning rate 0.0002 and momentum 0.5; learning rate decayed by a factor of 0.5
- batch size 64
- data augmentation: flipping, rotating, zooming, and cropping
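The learning-rate schedule above can be sketched as a simple step decay; the 100-epoch decay interval is an assumption, since the note only records the 0.5 factor:

```python
def learning_rate(epoch, base_lr=2e-4, decay=0.5, step=100):
    """Step-decay schedule: multiply the learning rate by `decay`
    every `step` epochs (interval is a hypothetical choice)."""
    return base_lr * decay ** (epoch // step)
```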